Haoyang Zhang

Tony

The WER Trap: Shattering the Illusion of Unified Tokens in Speech Language Models

May 28, 2026
Via arXiv

StepAudio 2.5 Technical Report

May 22, 2026
Via arXiv

DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action

May 20, 2026
Via arXiv

Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation

May 13, 2026
Via arXiv

Step-Audio-R1.5 Technical Report

Apr 28, 2026
Via arXiv

From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution

Apr 16, 2026
Via arXiv

Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving

Mar 05, 2026
Via arXiv

CoWork-X: Experience-Optimized Co-Evolution for Multi-Agent Collaboration System

Feb 04, 2026
Via arXiv

AlignDrive: Aligned Lateral-Longitudinal Planning for End-to-End Autonomous Driving

Jan 05, 2026
Via arXiv

DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection

Jan 01, 2026
Via arXiv